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Bernard Jones told you about multiresolution analysis in his wavelet lectures. 
This is a pretty well formalized and self-contained area of wavelet analysis. 
Multiscale methods form a wider and less well defined area of tools and ap- 
proaches; the term is used, e.g., in numerical analysis, but the main range of 
multiscale methods are based on application of wavelets. 

In this this lecture I shall explain how to carry out multiscale morphological 
analysis of cosmological density fields. For that, we have to use wavelets to 
decompose the data into different frequency bands, to calculate densities, and 
to describe morphology. 

Let us start with wavelet transforms. 



1 Wavelet Transforms 

The most popular wavelet transforms are the orthogonal fast transforms, de- 
scribed by Bernard Jones. For morphological analysis, we need different trans- 
forms. The easiest way to understand wavelets is to start with continuous 
wavelet transforms. 

1.1 Continuous Wavelet Transform 

The basics of wavelets are most easily understood in the case of one-dimensional 
signals (time series or data along a line). The most commonly used decom- 
position of such a signal (f(x)) into contributions from different scales is the 
Fourier decomposition: 



/W = / f (x) exp(—i kx) dx 



The Fourier amplitudes f(k) describe the frequency content of a signal. They 
are not very intuitive, however, as they depend on the behaviour of a signal 



2 Enn Saar 



as a whole; e.g., if the signal is a density distribution along a line, then all 
the regions of the universe where this line passes through, contribute to it. 
Fourier modes are homeless. 

For analyzing the texture of images and fields, both scales and positions 
are important. The right tools for that are continuous wavelets. A wavelet 
transform of our signal f(x) is 

where ip(x; a, b) is a wavelet profile. Here the argument b ties the wavelet to 
a particular location, and a is the scale factor. 

In order to be interesting (and different from the Fourier modes), typical 
wavelet profiles have a compact support. Two popular wavelets are shown in 
Fig. 1 - the Morlet wavelet, and the Mexican hat wavelet (see the formulae 
in Bernard Jones lecture). The Morlet wavelet is a complex wavelet. 




Continuous wavelets are good for finding sharp edges and singularities of 
functions. An example of that is given in Fig. 2. The brightness-coded wavelet 
amplitudes (mexican-hat wavelet) for the upper-panel skyline show features 
at all scales. 

This example shows us also the information explosion inherent to continu- 
ous wavelets - a function f(x) gives rise to a two- argument wavelet amplitude 
W(a, b); the collection of continuous wavelet amplitudes is heavily redundant. 
If the wavelet is well-behaved (2), the wavelet amplitudes can be used to 
restore the original function: 

CV Jo J-oo Va \ a J a 2 
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Fig. 2. Example (continuous mexican hat wavelet) 



where 

Jo v 

(ip* is a complex conjugate of tp, and i/j is the Fourier transform of ip). This 
constant exists (C$ < oo), when 

/oo 
ip{x) dx = . (2) 
-oo 

This is the only requirement that a continuous-transform wavelet has to sat- 
isfy. Equation (1) shows that we need to know the wavelet amplitudes of all 
scales to reconstruct the original function. We also see that large-scale wavelet 
amplitudes are heavily downweighted in this reconstruction. 

1.2 Dyadic Wavelet Transform 

It is clear that the information explosion inherent in application of continuous 
wavelets has to be restrained. The obvious way to proceed is to consider 
computational restrictions. 

First, computations need to be carried out on a discrete coordinate grid 
hi. Second, if we look at Fig. 2 we see that the wavelet amplitudes change, 
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in general, smoothly with scale. This suggests that a discrete grid of scales 
could suffice to analyze the data. While the obvious choice for the coordinate 
grid is uniform, the scale grid is usually chosen logarithmic, in hope to better 
catch the scale-dependent behaviour. This discretisation generates much less 
data than the continuous transform did, only N x J, where N is the size of 
the original data (the number of grid points), and J is the number of different 
scales. The most popular choice of the scale grid is where neighbouring scales 
differ by a factor of two. Other choices are possible, of course, but this choice 
is useful in several respects, as we shall see below. 

Such a wavelet transform is called dyadic. As we saw above, any compact 
function with the zero mean could be used as an analyzing wavelet. In the 
case of a discrete wavelet transform, we, however, have a problem - can we 
restore the original signal, given all the wavelet amplitudes obtained? It is 
not clear at first sight, as the restoration integral for a continuous wavelet (1) 
contains contributions of all wavelet scales. 

The answer to that is yes, given that the frequency axis is completely 
covered by the wavelets of dyadic scales (the wavelets form a so-called frame). 
This requirement is referred to as the perfect reconstruction condition; I shall 
show a specific example below. 

Let us rewrite now the wavelet transform, for finite grids. As we have heard 
about the multiresolution analysis already, we shall try to introduce both the 
smooth and the detail part of the transform. Let the initial data be ao(m) 
(the index shows the transform order, and the argument m - the grid point). 
The smoothing operation and the wavelet transform proper are convolutions, 
so they can be written 

a.j+i(m) = hj (k)aj (m + k) , dj + i(m) = ^^gj{k)a,j(m + k) , (3) 
k fe 

where the filters hj(k) and gj{k) are different from zero for a few values of k 
around only (the wavelet transform is local). Applying recursively this rule, 
we can find wavelet transforms of all orders, starting from the data (a (-))- 

Now, we should like to be able to restore the signal, knowing the aj(-) and 
dj(-). As the previous filters were linear, restoration should also be linear, and 
we demand: 

aj(m) = hj(k)aj + i(m + k) + '^^gj(k)dj + i(m + k) . (4) 

fc fc 

It can be proved that these rules will work for any dyadic wavelet that satisfies 
the so-called unit gain condition, if the filters are upsampled (see next section). 

If you recall (or look up) Bernard Jones' lecture, you will notice that 
the rules (3,4) look the same as the rules for bi-orthogonal wavelets. The only 
difference is that for bi-orthogonal wavelets the filters have to satisfy an extra, 
so-called aliasing cancellation condition, arising because of the downsampling 
of the grid. 
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1.3 A Trous Transform 

This downsampling leads us to the next problem. The change of the frequency 
(scale) between wavelet orders when bi-orthogonal or orthogonal wavelet 
transforms are used, is achieved by downsampling - choosing every other 
data point. Applying the same wavelet filter on the downsampled data set is 
equivalent to using the twice as wide filter on the original data. Now, in our 
case the grid is not diluted, and all points participate for all wavelet orders. 
The obvious solution here is to upsample the filter - to introduce zeros be- 
tween the filter points. This doubles the filter width, and it is also useful for 
the computational point of view, as the operations count for the convolutions 
does not depend now on the wavelet order. Because of these zeros (holes), such 
a dyadic transform is called 11 a irons'" ('with holes' in French; the transform 
was introduced by French mathematicians, Holschneider et al. ([5])). 
In Fourier language, such an upsampling halves the frequency: 

hx{ui) = h (2uj) , hj(uj) = h{2 j u) , 

where u> stands for the frequency, and h(k) = ho(k), h(co) = ho(uj). Although 
we describe localized transforms, it is useful to use Fourier transforms, occa- 
sionally - convolutions are frequent in wavelet transforms, and convolutions 
are converted to simple multiplications in Fourier space. Now, in coordinate 
space, the explicit expression for the upsampled filter h(-) is: 

h j (k') = h(k)6(k' -k-2 j ) , 

saying simply that the only non-zero components of the smoothing filter for 
the order j are those that have the index k ■ 2 3 , and they are determined by 
the original filter values h(k) = h (k). Let us write now the smoothing sum 
again: 

aj+\{m) — hj(k')aj(m + k') = h(k)aj(m + 2-*k) , (5) 

k' k 

where we retained only the non-zero terms in the last sum. The last equality 
is of the form usually used in a trous transforms; the holes are returned to the 
data space again. However, the procedure is different from the bi-orthogonal 
transform, as we find the scaling and wavelet amplitudes for every grid point 
m, not for the downsampled sets only. The wavelet rule with holes reads 

d 3 +i(m) = Y,9{k)aj{m + Vie) , (6) 

k 

where, obviously, g(k) = go(k). 

Let us now construct a particular a trous transform, starting backwards, 
from the reconstruction rule (4). In multiresolution language, this rule tells 
us that the approximation (sub)space Vj where the "smooth functions" live, 
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is a direct sum of two orthogonal subspaces of the next order (the formula is 
for projections, but it follows from this fact). Let us take it in a much simpler 
way, literally, and demand: 

Oj (to) = a j+1 (to) + d j+ i (to) , (7) 

or h(k) — g(k) = <5 fe . This is a very good choice, as applying it recursively 
we get 

a (m) = aj(m) + ^dj(m) , (8) 

3 = J 

meaning that the data is decomposed into a simple sum of contributions of 
different details (wavelet orders) and the most smooth picture. As there are no 
extra weights, these detail spaces have a direct physical meaning, representing 
the life in the full data space at a given resolution. 

The condition (7) gives us at once the formula for the wavelet transform 

d j+ i(m) = a,j(m) - a J+1 (m) = ^ [5 0k - h(k)] a^m + k) , (9) 

k 

or g(k) = S Qk - h(k) . 

So far, so good. Now we have only to choose the filter h(k) to specify 
the transform. As the filter is defined by the scaling function <p(x) via the 
two-scale equation 

cj>(x/2) = 2^2h(k)cj>(x-k), (10) 

k 

meaning simply that the scaling function of the next order (note that f(x/2) 
is only half as fast as f(x)) has to obey the smoothing rule in (5) exactly (the 
space where it lives is a subspace of the lower order space). Different normal- 
izations are used; the coefficient 2 appears here if we omit extra coefficients 
for the convolution (10) in the Fourier space: 

(recall that the coordinate space counterpart of f(2u>) is f(x/2)/2). Obviously, 
not all functions satisfy the two-scale equation (10), but a useful class of 
functions that do are box splines. 

1.4 Box Splines 

Box splines are easy to obtain - an n-th degree box spline B n (x) is the result 
of n+1 convolutions of the box profile B (x) = 1, x e [0, 1] with itself. Some 
authors like to shift it, some not; we adopt the condition that the convolution 
result is centred at when n is odd, and at x = 1/2 when n is even. This 
convention gives a simple expression for the Fourier transform of the spline: 
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i . . / sin(w/2) \ . , , , „ 
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where e = 1, if n is even, and otherwise. You can easily derive the formula 
yourselves, recalling that the Fourier transform of a [—1, 1] box is the sinc(-) 
function, and using the rule for the argument shifts. 

Box splines have, just to start, several very useful properties. First, they 
are compact; in fact, they are the most compact polynomials for a given degree 
(B n (x) is a polynomial of degree n). Second, they are interpolating, 

^B j ( !C -fc) = l ) (12) 

k 

a necessary condition for a scaling function. And box splines satisfy the two- 
scale equation (10), with 

h{k) = 2-(«+D Q , (13) 

where n is the degree of the spline (sec de Boor ([2]). This formula is written 
for unshifted box splines, and here the index k ranges from to n. It is easy to 
modify (13) for centred splines; e.g., for centred box-splines of an odd degree 
n the index k ranges from — (n + l)/2 to (n + l)/2, and we have to replace k 
at the right-hand side of (13) by k + (n + 1)2. 

I do not intend to be original, and shall choose the B 3 box spline for the 
scaling function. This is the most beloved box spline in astronomical commu- 
nity; see, e.g., the monograph by Jean-Luc Starck and Fion Murtagh ([14]) 
for many examples. As any spline, this can be given by different polynomials 
in different coordinate intervals; fortunately, a compact expression exists: 

4>(x) = B 3 (x) = ^ (k - 2| 3 - 4\x - 1| 3 + 6|.t| 3 - A\x + 1| 3 + \x + 2| 3 ) . 

(14) 

This function is identically zero outside the interval [—2, 2]. Formula (13) gives 
us the filter h(k): 

h(k) = (1/16,1/4,3/8,1/4,1/16), fee [-2,2]. (15) 

In order to obtain the associated wavelet tp(-), we have to return to our recipe 
for calculating the wavelet coefficients (9). These coefficients are, in principle, 
convolutions (multiplications in Fourier space): 



d i+1 (uj) = i)(uj)ai(uj) , 



(16) 
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So, (9) gives us 

ip{u))ai{ijj) — (j)((j/2)di(u>) — (j){bj)di(uj) , 

or, 

= 4>(u/2) - 0(w) . 
For coordinate space, the above expression transforms to 

ip(x) = 2<p{2x) - 4>{x) . 
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Fig. 3. The B3 scaling function (4>) and its associated wavelet (ip) 

Both the -B3 box spline (the scaling function <j>{x) and its associated wavelet 
4>{x) are shown in Fig. 3. 

I cannot refrain from noting that the last exercise was, strictly speaking, 
unnecessary. We could have proceeded with our wavelet transform after ob- 
taining the filter h(k) (15). But is nice to know the wavelet by face and to get 
a feeling what our algorithms really do. 

Now I can also show you the Fourier transform of the wavelet (17), to 
reassure you that such a wavelet can be built and that it does not leave gaps 
in the frequency axis. We see, first, that the filter peaks at u> w tt, giving for 
its characteristic wavelength A = 2n /uj = 2 (grid units). Second, we see that 
neighbouring wavelet orders overlap in frequency. The reason for that is that 
our wavelets are not orthogonal. So we loose a bit in frequency separation, 
but gain in spatial resolution. 

Two points more: first, about normalization. Although our scaling func- 
tion 4>(x) is normalized in the right way (its integral is 1), the coefficient 
in the two-scale equation (10 is different from the conventional one, and, as 
a result, the filter coefficients hk sum to 1, not to \/2. The integral over the 
wavelet profile is zero (this ensures that ip{x) is really a wavelet), but its norm 
/ ip 2 (x)dx w 0.2345; normally it is chosen to be 1. What matters, really, is 



(17) 
(18) 




X 
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Fig. 4. The Fourier transforms of the three subsequent orders of the i?3(-)-associated 
wavelet. The transforms fully cover the frequency axis, but the overlap between 
different orders is substantial 

that it is not zero and does not diverge; welcome to the wavelet world of free 
normalization. 

Second, about initial data. We can recursively apply the transformation 
rules (5, 6) only if we assume that the data ao(m) belongs to the class of 
smooth functions (those obtained by convolution with the scaling function). 
So, if the raw data comes in ticking at grid points (regular times), we should 
smooth it once before starting our transform chain. If the raw data is given at 
points x that do not coincide with the grid, the right solution is to distribute it 
to the grid m with the scaling weights 4>(x—m). This procedure was christened 
'extirpolation' (inverse interpolation) by Press et al. ([9]); a strange fact is that 
N-body people extirpolate all the time in their codes, but nobody wants to 
use the term. 

So, we have specified all the recipes needed to perform the a trous trans- 
form. Before we do that, we have to answer the question - why? Bernard Jones 
demonstrated us how well orthogonal wavelet transforms work. An orthogo- 
nal wavelet transform changes a signal (picture) of N pixels into exactly N 
wavelet amplitudes, while an a trous transform expands it into NxJ pictures; 
why bother? 

The reason is called 'translational invariance'. As many of the wavelet 
amplitudes of an orthogonal transform do not have an exact home, then, 
when shifting the data, these amplitudes change in strange ways. Sure, we 
can always use them to reconstruct the shifted picture, but it makes no sense 
to compare the wavelet amplitudes of the original and shifted pictures. All 
the a trous transforms, in the contrary, keep their amplitudes, these move 
together with the grid. This is 'translational invariance', and it is important 
in texture studies, where we want to see different scales of a picture at exactly 
the same grid point. And cosmic texture is the main subject of this lecture. 

A point to note - Fourier transforms are not translation invariant, too. 
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1.5 Multi-Dimensional A TVous 

All the above discussion was devoted to one-dimensional wavelets. This is 
customary in wavelet literature, as the step into multi-dimensions is simple 
- we form direct products of independent one-dimensional wavelets, one for 
every coordinate. This has been the main approach up to now, although it 
does not work well everywhere. An important example is a sphere, where 
special spherical wavelets have to be constructed (I suppose that these will be 
explained in the CMB-lectures). 

So, two-dimensional wavelets are introduced by defining the 2-D scaling 
function 

<t>{x,y) = <l>{x)4>{y) , (19) 
and three-dimensional wavelets - by the 3-D scaling function 

4>(x,y,z) = 4>(x)<l>(y)<t>(z) . (20) 

A little bit extra care has to be taken to define wavelets; we have to step into 
Fourier space for a while for that. Recalling (17), we have to write for two 
dimensions 

1p(u>l,U>2) = </>(wi/2)0(w 2 /2) - <}>(uJi)<f>(W2) 

(the direct products (19, 20) look exactly the same in the Fourier space). For 
coordinate space, it gives 

il>(x,y)=ty(2x)<l>(2y)-<t>(x)<t>(y) , 

and for three dimensions, respectively 

4>{x,y,x) = %(j){2x)(l){2y)4>{2z) - <f>(x)<f>(y)4>(z) , 

I show the ^-associated wavelets in Figs. 5 and 6. Because of their def- 
inition, the wavelet profiles are symmetric, but not isotropic, right? A big 
surprise is that both the scaling functions and the wavelets are practically 
isotropic, as the figures hint at. Let us define the isotropic part of the 2-D 
wavelet as 

1 f 27r 

% P{ r ) — 7T~ / ip(r cos a, r sin a) da , 
2n Jo 

and estimate the deviation from isotropy by 

p2 pi 

j / \i>(x,y) - i)(\Jx 2 + y 2 )\ dxdy . 

J -2 J -2 

Comparing e with the integral over the absolute value of our wavelet itself 
(about 4/9), we find that the difference is about 2%. For three dimensions, 
the difference is a bit larger, up to 5%. 

This isotropy is important for practical applications; it means that our 
choice of specific coordinate directions does not influence the results we get. 




Fig. 6. Three-dimensional ^-associated wavelet 



And, as an example, I show a sequence of transforms in Fig. 7. Those are 
slices of a 3-D ^-associated a trous transform sequence for the gravitational 
potential image of a N-body simulation. The data slice is at the upper left; the 
left column shows the scaling solutions, higher orders up. The right column 
shows the wavelet amplitudes; the data can be restored by taking the lower 
left image, and by adding to it all the wavelet images from the right column. 
Simple, is it? 
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Fig. 7. A trous potential (slices) for a N-body model (linear 
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2 Cosmological Densities 

In cosmology, the matter density distribution is a very important quantity 
that, first, determines the future dynamics of structure, and, second, may 
carry traces of the very early universe (initial conditions). We, however, cannot 
observe it. The data we get, after enormous effort, gives us galaxy positions 
in (rcdshift) space; even if we learn to associate a proper piece of dark matter 
with a particular galaxy, we have a point process, not a continuous density 
field. 

There are several ways to deal with that. First, we have to introduce a 
coordinate grid; an extra discrete entity, but necessary when using continuous 
fields in practical computations. 

2.1 N-Body Densities 

A similar density estimation problem arises in N-body calculations. The fun- 
damental building blocks there are mass points; these points are moved around 
every timestep, and in order to get the accelerations for the next timestep, 
the positions of the points have to be translated to the mass density on the 
underlying grid. 

The simplest density assignment scheme is called NGP (nearest grid point), 
where the grid coordinate i is found by rounding the point coordinate x (to 
keep formulae simple, consider onc-dimcnsional case and grid step one; gener- 
alization is trivial). Thus, the NGP assignment law is i = floor(x + 0.5). This 
scheme is pretty rough, but I have seen people using an even worse scheme, 
i = fioor(x). 

A bit more complex scheme is CIC (cloud in cell) where the mass point is 
dressed in a cubic cloud of the size of the grid cell, and vertices's are dressed in 
a similar region of influence. The part of the cloud that intersects this region 
of influence is assigned to the vertex. Sounds complicated, but as the mass 
weights are obtained by integration over the constant-density cloud, it is, in 
fact, only linear extirpolation (in 3-D, three linear). 

The most complex single-cloud scheme used is TSC (triangular shaped 
cloud), where the density of the cloud changes linearly from a maximum in 
the centre, to zero at the borders. The mass integration needed ensures that 
this scheme is quadratic extirpolation. 

Note that the last two mass assignment schemes are, if fact, centred B- 
splinc assignments - the vertex region is Bq, the CIC density is Bq, too, and 
the TSC cloud is B\ . The weights are obtained by convolution of those density 
profiles with the vertex profile, so, finally, NGP is a B extirpolation scheme 
(no density law to be convolved with), CIC becomes a B\ extirpolation scheme 
and TSC - a B 2 scheme. 

Today's mass assignment schemes are mostly adaptive variations of those 
listed above, where more dense grids are built in regions of higher density. But 
the N-body mass assignment schemes share one property - mass conservation. 
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No matter what scheme is used, the total mass assigned to all vertexes of the 
grid is equal to the total mass of the mass points. No surprise in that - box 
splines are interpolating (12). 

2.2 Statistical Densities 

This class of density assignments (estimators) does not care about mass con- 
servation. The underlying assumption is that we observe a sample of events 
that are governed by an underlying probability density, and have to estimate 
this density. In cosmology, there really is no difference between the two den- 
sities, spatial mass density and probability density. 

The basic model of galaxy distribution adopted by cosmological statistics 
is that of the Cox point process. It says that, first, the universe is defined by 
a realization of a random process that ascribes a probability density A(x) in 
space. Then, a Poisson point process gets to work, populating the universe 
with galaxies, where the probability to have a galaxy at x is given by the Pois- 
son law with the parameter A(x) fixed by the initial random-field realization. 
Neat, right? And as we are dealing with random processes, no conservation is 
required. 

Statisticians have worked seriously on probability density estimation prob- 
lem (see Silverman [13] for a review). The most popular density estimators 
are the kernel estimators: 

Q i = Y J K{x n -i) (21) 

n 

(recall that i are the grid coordinates and are the galaxy coordinates). The 
kernel if is a symmetric distribution: 

J K{x)dx = 1 , J xK(x) = . 

Much work has been done on the choice of kernels, with the result that the 
exact shape of the kernel does not matter much, but its width does. The best 
kernel is said to be the Epanechikov kernel: 

K E (x) = A(l - yi 2 /R 2 ) , x 2 < R 2 , otherwise . (22) 

(I wrote it for multidimensional case to stress that this is not a direct-product, 
but an isotropic kernel; A is, of course, a normalization constant). The Gaus- 
sian kernel comes close behind: 

K G (x) = -^cxp(-x 2 /2<j 2 ) . (23) 

This is the only kernel where the direct product is isotropic, too. The ranking 
of kernels is done by deciding how close the estimated probability density f{x) 
is to the true density f(x), by measuring the MSE (mean standard error): 



Multiscale Methods 15 



MSE = £ f(x)-f(x) = Var 



(/») 



+ Bias 



(/»)] 2 • (24) 



Note that statisticians minimize the MSE, not only the variance, as cosmolo- 
gists frequently tend to do. As usual in statistics, the results are asymptotic, 
true for a very big number of galaxies N. As I have tested, in the usual 
cosmological case where there are about 10 galaxies inside the Epanechikov 
kernel, the total mass over all vertexes is only a half of the galaxy number TV; 
Epanechikov cheats. It starts working properly for about 100 points inside the 
kernel. In this respect, the B3 kernel we used above, is is a very good candidate 
for a density estimation kernel. It is smooth (meaning its Fourier transform 
decays fast), and it is compact, no wide wings as the Gaussian kernel has. 
And it is interpolating, guaranteeing that not a single galaxy is lost. 

Kernel density estimators allow a natural generalization for the case of 
extremely different density amplitudes and scales, as seen in cosmology. 
Constant-width kernels tend to over-smooth the sharp peaks of the density, 
if these exist. The solution is using adaptive kernels, by varying the kernel 
width h(-) from place to place. There are, basically, two different ways to do 
that. The balloon or scatter estimators say: 



here the kernels sit on the grid points i. The second type of estimators is called 
sample point, sandbox, or gather estimators: 



Here the kernel width depends on the sample point. The most difficult problem 
for adaptive kernels is how to choose the right kernel widths. The usual way 
is to estimate the density with a constant kernel first, and to select the adap- 
tive kernel widths proportional to some fractional power of the local density 
obtained in the first pass (—1/5 is a recommended choice). 

Both estimators are used in cosmology (the terms scatter and gather come 
from the SPH cosmological hydrodynamics codes). The lore says that the bal- 
loon estimators (25) work best in low-probability regions (voids in cosmology), 
and the sandbox estimators - where densities are high. 

2.3 Equal-Mass Densities 

A popular density estimator is based on k-d trees. These trees are formed by 
recursive division of the sample space into two equal-probability halves (having 
the same number of galaxies). It is a spatial version of adaptive histograms 
(an equal number of events per bin). Of course, k-d trees give more than 
just density estimates, they also imprint a tree structure on (or reveal the 




(25) 
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structure of the geometry of) the density field. An application of k-d trees for 
estimating densities appeared in astro-ph during the school, and has already 
been published by the time of writeup of the lecture (Ascasibar & Binney [1]). 

Another popular equal-mass density estimators are kNN (k nearest neigh- 
bours) kernels. The name speaks for itself - the local kernel size is chosen to 
include k particles in the kernel volume. This estimator uses isotropic kernels. 

The SPH gather algorithm uses, in fact, the kNN ideology. There is a sep- 
arate free density estimation tool based on that algorithm ('smooth'), written 
by Joachim Stadel and available from the Washington University N-body 
shop 3 . Try it; the only problems are that you have to present the input data 
in the 'tipsy' format, and that you get the densities at particle positions, not 
on a grid. Should be easy to modify, if necessary. 

2.4 Wavelet Denoising 

Wavelet denoising is a popular image processing methodology. The basic as- 
sumption is that noise in an image is present at all scales. Once we accept that 
assumption, the way to proceed is clear: decompose the image into separate 
scales (wavelet orders; orthogonal wavelet transforms are the best here), esti- 
mate the noise at each wavelet order, eliminate it somehow, and reconstruct 
the image. 

This course of actions includes two difficult points - first, estimating the 
noise. The properties of the basically unknown noise are, ahem, unknown, and 
we have to make assumptions about them. Gaussian and Poisson noise are the 
most popular assumptions; this leaves us with the problem of relative noise 
amplitudes (variances) between different wavelet orders. A popular method is 
to model the noise. Modelling is started by assuming that all the first-order 
wavelet data is noise (interesting, is it?) and processing that for the noise vari- 
ance. After that, noise of that variance is modelled, wavelet transformed, and 
its properties found for every wavelet order. After that, we face a a common 
decision theory problem, at which p-value have to set the noise limit? If we 
cut the noise at too low amplitude, we leave much of it in the final image, and 
if we take the cut too high, we eliminate part of the real signal, too. Once we 
have selected that level, we can quantify it in the terms of the limiting wavelet 
amplitude kaj, where a? is the modelled noise variance for the level j. 

The second problem is how to suppress the noisy amplitudes. The first 
approach is called 'hard thresholding' and it is simple: the processed wavelet 
amplitudes uij are 

uij = Wj , if \iVj\ > kcfj , otherwise . (27) 

This thresholding leaves an empty trench around in the wavelet amplitude 
distribution. 

3 ttp : / /www- pec. astro.washington.edu/ 
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Another approach is 'soft thresholding': 

Wj — Wj — sgn(wj) k<jj , if \vjj \ > k<jj , otherwise . (28) 

This thresholding takes out the same trench, first, but fills it up then, dimin- 
ishing all the remaining amplitudes. 

David Donoho, who was the first to introduce soft thresholding, has also 
proposed an universal formula for the threshold level: 

k<jj = \/2 log(n) <jj , 

where n is the number of pixels in the image. This level corresponds to 3a 
for n = 90, and to 4<r for n = 3000. Of course, astronomers complain - 3000 
pixels is a very small size for an astronomical image, but 4tr is a very high cut- 
off level; we can cut off much of the information in the image. Astronomical 
information is hard to obtain, and we do not want to waste even a bit. So we 
better keep our pictures noisy? Fortunately, more approaches to thresholding 
have appeared in recent years; consult the new edition of the Starck and 
Murtagh book ([14]). 

Anyway, wavelet denoising has met with resounding success in image pro- 
cessing, no doubts about it. And image processing is an industry these days, 
so the algorithms that are used are being tested in practise every day. Now, 
image processing is 2-D business; wavelet denoising of a 3-D (spatial) density 
is a completely different story. The difference is that the density contrasts 
are much bigger in 3-D than in 2-D - there simple is more space for the 
signal to crowd in. And as wavelets follow the details, they might easily over- 
amplify the contrasts. I know, we have spent the last year trying to develop 
a decent wavelet denoising algorithm for the galaxy distributions (well, 'we' 
means mainly Jean-Luc Starck) . Fig. 8 shows how the denoising might go awry 
(right panel), but shows at the same time that good recipes are also possible 
(left panel). The denoising procedure for the right panel has over-amplified 
the contrasts, and has generated deep black (zero density) holes close to white 
high density peaks. So, in order to do a decent denoising job, one has to be 
careful. 

The details of the algorithm we used are too tedious to describe in full, 
they can be found in Starck and Murtagh ([14]).. The main points are: 

1. We used the a trous algorithm, not an orthogonal one. The reason for 
that is that we needed to discriminate between the positions of significant 
wavelet amplitudes (the multiresolution support) and non-significant am- 
plitudes at the last stage, and when speaking of positions, orthogonal 
wavelet transforms cannot be used. 

2. We hard thresholded the solution and iterated it, reconstructing and trans- 
forming again, to obtain a situation where the final significant wavelet 
coefficients would cover exactly the original multiresolution support. 

3. Finally, we smooth the solution by imposing a smoothness constraint, 
requiring that the sum of the absolute values of wavelet amplitudes of 
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Fig. 8. Wavelet denoising of a (model) galaxy distribution (left - a successful at- 
tempt, right - over-denoising) 

a given order would be minimal, while keeping the corrected amplitudes 
themselves within a given tolerance (aj/2) of the original noisy ones. We 
consider here only the amplitudes in the multiresolution support; this 
point required using a translation invariant wavelet transform. 

Fig. 9 compares the results of our 3-D wavelet denoising. We started with a 
3-D galaxy distribution, applied our algorithm to it, and, for comparison, built 
two Gaussian-smoothed density distributions. It is clearly seen that the typical 
details, in the case of Gaussian smoothing, are of uniform size, while the 
wavelet-denoised density distribution is adaptive, showing details of different 
scales. 

2.5 Multiscale Densities 

We have made an implicit assumption during all this section, namely, that a 
true density field exists. Is that so certain? Our everyday experience tells us 
that it is. But look at the numbers that stand behind this experience: even 
one cubic centimetre of air has 6 x 10 23 / 22.4 x 10 3 w 3 x 10 19 particles. In our 
surveys, one gigantic cosmological 'cubic centimetre' of lOMpc size contains 
about 10 galaxies. Can we speak about their true spatial density? 

One answer is that we can, but there are regions where the density esti- 
mates are extremely uncertain; statistics can tell us what the expected vari- 
ances are. Another answer is that even if there is a true density, it is not 
always a useful physical quantity, especially for the largest scales we study. 
One of the reasons we measure the cosmological density field is to find its state 
of evolution and traces of initial conditions in it. The theory of the dynamics 
of perturbations in an expanding universe predicts that structure evolves at 
different rates, slowly at large scales and much more rapidly at galactic scales. 
Observations show that cosmological fields are multiscale objects; the recently 
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Fig. 9. Density fields for a model galaxy distribution. Left - Gaussian a = 1 Mpc/h 
smoothing, middle - wavelet denoising, right - Gaussian a — 3Mpc/fo smoothing 

determined power spectra span scales from about 600Mpc/fr (k = 0.01 h/Mpc 
to 10Mpc//i (k = 0.6/i/Mpc. Thus, should not these fields be studied in mul- 
tiscale fashion, scale by scale? A true adaptive density mixes effects from dif- 
ferent scales and scale separation could give us a cleaner look at the dynamics 
of large-scale structure. 

In case of our everyday densities, this separation of scales can be done 
safely later, analyzing the true density. For galaxy distributions, it is wiser to 
prescribe a scale (range) and to obtain that density directly from the observed 
galaxy positions. One advantage is in accuracy, another in that there are places 
(voids), where, e.g., small-scale densities simply do not exist. 

And this is the point where the first part of the lecture (wavelets) con- 
nects with the second part (density fields). The representation of the observed 
density fields by a sum of the densities of different characteristic scales (7) is 
just what we are looking for. True, there is a pretty large frequency overlap 
between the neighbouring bands (Fig. 10 shows you their power spectra), but 
that is possibly the best we can do, while keeping translational invariance. 
The figure shows also the power spectra of Gaussians of the same scales that 
are sometimes used to select different scales. As we see, Gaussian frequency 
bands are heavily correlated, the overlap of the smaller frequency band with 
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that of the higher one is total, and that is natural - smoothing destroys sig- 
nals of high frequency, but it does not separate frequency bands. So Gaussians 
should not be used in this business, but, alas, they frequently are. 

0.25 



0.2 
0.15 

0- 

0.1 
0.05 



2 4 6 8 10 

CO 

Fig. 10. Power spectra for two neighbouring scales, the ^-associated a trous bands 
(solid lines), and for Gaussian smoothing of the same scales /(dotted lines). Only 
the positive half on the frequency axis is shown 

Figs. 11 and 12 show the a trous density slices for a Gaussian cube. This is 
a 256 3 realization of a Gaussian random field with a power spectrum approx- 
imating that of our universe. In real universe, the size of the cube would be 
256Mpc//i. The slices are taken from the same height; The images are in gray 
coding, black shows the densest regions. As in Fig. 7, the scaling solutions 
form the left column and the wavelets - the right column; transform orders 
grow downwards. In height, the wavelet orders are placed between the scaling 
order that produced them. The scaling solution of order three is repeated in 
Fig. 12 to keep the scaling-wavclct alignment. Enjoy. 

3 Minkowski Functionals 

Peter Coles explains in his lecture why it is useful to study morphology of 
cosmological fields. In short, it is useful because it is sort of a perpendicular 
approach to the usual moment methods. Our present cosmological paradigm 
says that the initial perturbation field was a realization of a Gaussian random 
field. The most direct test of that would be to measure all n-point joint am- 
plitude distributions, starting from the 1-point distribution. Well, we know 
that even this is not Gaussian, but we know why (gravitational dynamics of a 
positive density field inevitably skews this distribution), and we can model it. 
As cosmological densities are pretty uncertain, the more uncertain are their 
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Fig. 11. A trous density (slices) of a Gaussian density field, (linear scale). The first 
orders (0-3 for the scaling solutions, 1-3 for the wavelets) 
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Fig. 12. A trous density (slices) of a Gaussian density field, (linear scale). The 
large-scale orders (3-6 for the scaling solutions, 4-6 for the wavelets) 
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many-point joint distributions. So this direct check does not work, at least 
presently. 

Another possibility to check for Gaussianity is to estimate higher order 
correlation functions and spectra. For Gaussian realizations, odd-order corre- 
lations and power spectra should be zero, and even-order moments should be 
directly expressible via the second-order moments, the usual two-point corre- 
lation function and the power spectrum. Their dynamical distortions can also 
be modelled, and this is an active area of research. 

Morphological studies provide an independent check of Gaussianity. Mor- 
phology of (density) fields depends on all correlation functions at once, is 
scale-dependent, but local, and can also be predicted and its change caused 
by dynamical evolution can be modelled. This lecture is, finally, about mea- 
suring morphology of cosmological fields. 

An elegant description of morphological characteristics of density fields 
is given by Minkowski functionals [8]. These functionals provide a complete 
family of morphological measures - all additive, motion invariant and condi- 
tionally continuous 4 functionals defined for any hypersurface are linear com- 
binations of its Minkowski functionals. 

The Minkowski functionals (MF for short) describe the morphology of iso- 
density surfaces, and depend thus on the specific density level 5 . Of course, 
when the original data are galaxy positions, the procedure chosen to calculate 
densities (smoothing) will also determine the result. The usual procedure used 
in this business is to calculate kernel densities with wide Gaussian kernels; the 
recipes say that the width of the kernel (standard deviation) should be either 
the mean distance between galaxies or their correlation length, whichever is 
larger. Although this produces nice smooth densities, the recipe is bad, it 
destroys the texture of the density distribution; I shall show it later. 

We shall use wavelets to produce densities, and shall look first at the tex- 
ture of a true (wavelet-denoised) density, and then at the scale-dependent 
multiscale texture of the galaxy density distribution. We could also start di- 
rectly from galaxies themselves, as Minkowski functionals can be defined for 
a point process, decorating the points with spheres of the same radius, and 
studying the morphology of the resulting surface. This approach does not re- 
fer to a density and we do not use it here. Although it is beautiful, too, the 
basic model that it describes is a (constant-density) Poisson process; a theory 

4 Note added during final revision: We submitted recently a paper on multiscale 
Minkowski functionals, and the referee wondered what does 'conditionally con- 
tinuous' mean. So, now I know - they are continuous if the hypersurfaces are 
compact and convex, and we can approximate any decent hypersurface by unions 
of such. 

5 In fact, Minkowski functionals depend on a surface, that is why they are called 
functionals (functions of functions). When we specify the family of iso-density 
surfaces, the functionals will depend, suddenly, only on a number, the value of 
the density level, and are downgraded to simple functions, at least in cosmological 
applications. 
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for that case exists, and analytical expressions for Minkowski functionals are 
known. Alas, as the galaxy distribution is strongly correlated, this reference 
model does not help us much. The continuous density case has a reference 
model, too, and that is a Gaussian random field, so this is more useful. 

For a d-dimensional space, one can find d + 1 different Minkowski func- 
tionals. We shall concentrate on usual 3-D space; for that, the Minkowski 
functionals are defined as follows. Consider an excursion set F<p of a field 
<^>(x) in 3-D (the set of all points where density is larger than a given limit, 
<H X — </>o))- Then, the first Minkowski functional (the volume functional) is 
the volume of this region (the excursion set): 

V (<f>o) = / d 3 x. (29) 

Jf *o 

The second MF is proportional to the surface area of the boundary SF^ of the 
excursion set: 

Vi(<h) = \l dS(x) , (30) 

(but not the area itself, notice the constant). The third MF is proportional to 
the integrated mean curvature of the boundary: 

vM 'lL^ + «^) dS{x> ' (31) 

where i?i(x) and i?2(x) are the principal curvatures of the boundary. The 
fourth Minkowski functional is proportional to the integrated Gaussian cur- 
vature (the Euler characteristic) of the boundary: 

nv-hUiW)"™- (32) 

The last MF is simply related to the genus that was the first morphological 
measure used in cosmology; all these papers bear titles containing the word 
'topology'. Well, the topological Euler characteristic \ f° r a surface in 3D can 
be written as 

X=^ [ *dS, (33) 



s 



where k is the Gaussian curvature, so 



V 3 = \x ■ (34) 

Bear in mind, though, that the Euler characteristic (33) describes the topology 
of a given iso-density surface, not of the full 3-D density distribution; the 
topology of the latter is, hopefully, trivial. 
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The first topology papers concentrated on the genus G that is similar to 

X = 2(1 - G) , V 3 = l-G. (35) 

The functional V3 is a bit more comfortable to use - it is additive, while G 
is not, and in the case our surface breaks up into several isolated balls, V 3 is 
equal to the number of balls. If the excursion set contains only a few isolated 
empty regions (bubbles), V3 gives their number. In a general case 

V3 = #-of-balls + #-of-bubblcs — #-of-tunnels , 

where only these tunnels that are open at both ends, count. 

I have to warn you about a possible confusion with the genus relations 
(33-35) - the coefficient 2 (or 1/2) occupies frequently a wrong position. 
The confusion is due to a fact that two topological characteristics can be 
defined for an excursion set - one for its surface, another for the set itself. 
The relation between these depends on the dimensionality of the space; for 
3-D the topological characteristic for the excursion set is half of that for the 
surface, and if we mix them up, our formulae become wrong. I know, we have 
published a wrong formula, too (even twice), but the formulae arc right in our 
book [6]. So, bear in mind that the Minkowski functionals are calculated for 
surfaces, and use only the relations above (33-35). When in doubt, consult 
the classical paper by Mecke et al. [8], and use the Crofton's formula below 
(42) for a single cubic cell - it gives you V 3 = 1 => G = 0. 

Fig. 13 shows a Gaussian cube (a realization of a Gaussian random process) 
for two different smoothing widths (the left pair and the right pair of columns, 
respectively) , and for three volume fractions. You can see that the solid figures 
inside the isodensity surface are awash with handles, especially at the middle 
50% density level. Of course, the larger the smoothing, the less the number of 
handles. You can also see that Gaussian patterns are symmetric - the filled 
regions are exact lookalikes of the empty regions, for a symmetric change of 
volume fractions. 

Galaxy densities are more asymmetrical, as seen in Fig. 14. This figure 
shows a model galaxy distribution from a N-body simulation, in a smaller 
cube. The 50% density volumes differ, showing asymmetry in the density 
distribution, and the 5% - 95% symmetry, evident for the Gaussian cube, is 
not so perfect any more. 

Instead of the functionals, their spatial densities Vi are frequently used: 

Vi(f) = Vi(f)/V , i = 0,...,3, 

where V is the total sample volume. The densities allow us to compare the 
morphology of different data samples. 

3.1 Labelling the Isodensity Surfaces 

The original argument of the functionals, the density level go, can have differ- 
ent amplitudes for different fields, and the functionals are difficult to compare. 
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The left two columns show isodensity surfaces for a — 3 pixels, the right two columns 
for a = 8 pixels. To better delineate isodensity surfaces, we show two sides of the 
surface in column pairs, where the left column shows high-density regions, and the 
right column - low-density regions for the same isodensity surface. The rows are for 
constant volume fractions (7%, 50%, and 93%), starting from below 



Because of that, normalized arguments are usually used; the simplest one is 
the volume fraction f v , the ratio of the volume outside of the excursion set 
to the total volume of the region where the density is defined (the higher the 
density level, the closer this ratio is to 1). Another, similar argument is the 
mass fraction f m , which is very useful for real, positive density fields, but 
is cumbersome to apply for realizations of Gaussian fields, where the den- 
sity may be negative. But when we describe the morphology of single objects 
(superclusters, say), the mass fraction is the most natural argument. It is 
also defined to approach 1 for the highest density levels (and for the smallest 
masses inside the isodensity surface). 

The most widely used argument in this business is the Gaussianized volume 
fraction v, defined as 

1 f°° 

fv = -j= / cxp(-< 2 /2) dt . (36) 
V ^ Jv 
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Fig. 14. A galaxy sample (60 3 pixels) for a 3-pixel smoothing scale. The left column 
shows high-density regions, and the right column - low-density regions for the same 
isodensity surface. The rows are for constant volume fractions (7%, 50%, and 93%), 
starting from below 



This argument was introduced already in the first topology paper by Gott 
[4], in order to eliminate the first trivial effect of gravitational clustering, the 
deviation of the 1-point pdf from the (supposedly) Gaussian initial pdf. Notice 
that using this argument, the first Minkowski functional is trivially Gaussian 
by definition. 

For a Gaussian random field, v is the density deviation from the mean, 
divided by the standard deviation. We can define a similar argument for any 
field: 



I show different Minkowski functionals versus different arguments in 
Figs. 15 and 16. They are calculated for the model galaxy density distri- 
bution shown in the previous figure (Fig. 14). Note how much the shape of 
the same function(al)s depends on the arguments used. 
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Fig. 15. The first two Minkowski functionals for N-body model galaxies. Here Vf is 
the volume fraction, m/ - the mass fraction, v a - the normalized volume fraction, 
and vg = v - the Gaussianized volume fraction from (36). The dotted line in the 
right panels shows the predicted Minkowski functionals for a Gaussian random field 
MF 



3.2 Gaussian Densities 

All the Minkowski functionals have analytic expressions for iso-density slices 
of realizations of Gaussian random fields. For three-dimensional space they 
arc: 

2 A ( v 2 \ 

,;2 = 3^ exp r^J ' (39) 

*3 = ^ 2 -Dexp(4), (40) 



where <£(•) is the Gaussian error integral, and A is determined by the correla- 
tion function £(r) of the field: 

A 2 - 1 e(0) (41) 

x ^iw • (41) 



The dimension of A is inverse length. 
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Fig. 16. The last two Minkowski functionals for N-body model galaxies. Here Vf is 
the volume fraction, m/ - the mass fraction, v a - the normalized volume fraction, 
and vg = v - the Gaussianized volume fraction from (36). The dotted line in the 
right panels shows the predicted Minkowski functionals for a Gaussian random field 
MF 



This expression allows to predict all the Minkowski functionals for a known 
correlation function (or power spectrum). We can also take a more empirical 
approach and to determine A 2 on the basis of the observed density field itself, 
using the relations £(0) = (g 2 ) and £"(0) = (g, 2 ), where g }i is the density 
derivative in one coordinate direction. 

The expected form of these functionals is shown in Fig. 17. 

In practice, it is easy to obtain good estimates of the Minkowski functionals 
for periodic fields. The real data, however, is always spatially limited, and the 
limiting surfaces cut the iso-density surface. An extremely valuable property of 
Minkowski functionals is that such cuts can be corrected for - the data volume 
mask is another body, and Minkowski functionals of intersecting bodies can 
be calculated. Moreover, if we can assume homogeneity and isotropy for the 
pattern, we can correct for border effects of large surveys. This is too technical 
for the lecture, so I refer to our forthcoming paper [10]. 

3.3 Numerical Algorithms 

Several algorithms are used to calculate the Minkowski functionals for a given 
density field and a given density threshold. We can either try to follow ex- 
actly the geometry of the iso-density surface, e.g., using triangulation, or to 
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Fig. 17. Gaussian predictions for Minkowski functionals A = 1 



approximate the excursion set on a simple cubic grid. The first algorithms 
were designed to calculate the genus only; the algorithm that was proposed 
by Gott, Dickinson, and Melott [4] uses a decomposition of the field into filled 
and empty cells, and another class of algorithms (Coles, Davies, and Pear- 
son [3]) uses a grid- valued density distribution. The grid-based algorithms are 
simpler and faster, but are believed to be not as accurate as the triangulation 
codes. 

I advocate a simple grid-based algorithm, based on integral geometry 
(Crofton's intersection formula), proposed by Schmalzing and Buchert ([12]). 
It is similar to that of [3], and for V 3 (genus) coincides with it. 

To start with, we find the density thresholds for given filling fractions / by 
sorting the grid densities. This is a common step for all grid-based algorithms. 
Vertexes with higher densities than the threshold form the excursion set. This 
set is characterized by its basic sets of different dimensions - points (vertexes), 
edges formed by two neighbouring points, squares (faces) formed by four edges, 
and cubes formed by six faces. The algorithm counts the numbers of elements 
of all basic sets, and finds the values of the Minkowski functionals as 

V = a 3 N 3 , 

/2 4 2 
V 2 = a {- Nl -N 2 + -N 3 

V 3 =N -N 1 + N 2 -N 3 , 
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where a is the grid step, N is the number of vertexes, N x is the number of 
edges, N-2 is the number of squares (faces), and N 3 is the number of basic 
cubes in the excursion set. 

This algorithm described above is simple to program, and is very fast, 
allowing the use of Monte-Carlo simulations for error estimation 6 . 

The first, and most used, algorithm was 'CONTOUR' and its derivations; this 
algorithm was written by David Weinberg and was probably one of the first 
cosmological public-domain algorithms [15]. However, there is a noticeable 
difference between the results that our grid algorithm produces and those of 
'CONTOUR'. indexalgorithm:CONTOUR You can see it yourself, comparing the 
Minkowski functional figures in this lecture with Fig. 4 in Peter Coles' chapter. 
All genus curves ever found by 'CONTOUR' look like that there, very jaggy. 
How much I have tried, using Gaussian smoothing and changing smoothing 
lengths, I have never been able to reproduce these jaggies by our algorithm. 
At the same time, the calculated genus curves always follow the Gaussian 
prediction, so there is no bias in either algorithm. As the genus (V3) counts 
objects (balls, holes, handles), it should, in principle, have mainly unit jumps, 
and only occasionally a larger jump. That is what we see in our genus (V3) 
graphs. The only conclusion that 1 can derive is that the algorithm we use is 
more stable, with a much smaller variance. 

3.4 Shapefinders 

As Minkowski functionals give a complete morphological description of sur- 
faces, it should be possible to use them to numerically describe the shapes 
of objects, right? For example, to differentiate between fat (spherical) objects 
and thin (cylindrical objects), banana- like superclusters and spiky superclus- 
ters. This hope has never died, and different shape descriptors have been pro- 
posed (a selection of them is listed in our book [6] ) . The set of shape descrip- 
tors that use Minkowski functionals was proposed by Sahni, Sathyaprakash 
and Shandarin [11], and is called 'shapefinders'. Now, shapefinder definitions 
have a habit of changing from paper to paper and it is not easy to follow 
the changes, so I shall give here a careful derivation of the last version of 
shapefinders. Shapefinders are defined as ratios of numbers that are similar to 
Minkowski functionals, but only close, in 3-D, the chain of Minkowski func- 
tionals Vi, from Vo to V3, has gradually diminishing dimensions, from L 3 to 
L°. So, the ratios of neighbours in this chain have a dimension of length; these 
ratios make up the first set of shapefinders. The proportionality coefficients 
are chosen to normalize all these ratios to R for a sphere of the radius R. 
Take a surface that is delimiting (shaping) a three-dimensional object (e.g., a 
supercluster). The definitions of the first three Minkowski functionals (29-31) 
can be rewritten as 

6 A thorough analysis of the algorithm and its application to galaxy distributions 
will be available, by the time this book will be published ([10]). 
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V = V = ^-R\ 

S = 6V1 = AnR 2 
C = 3nV 2 = R , 



where the second equalities stand for a sphere of a radius R, V is the volume, 
S - the surface, and C - the mean integrated radius of curvature: 

C= [U^ + -A^)dS. (43) 



s 2 \Ri(y) i? 2 (x) 

Whatever formulae you see in the literature, do not use them before checking 
with the definition below - these are the true shapefinders: 

3V 1 V 

#! = — = -— thickness , (44) 



H 2 = ^ = ~, breadth, (45) 

O 7T V 2 

H s = ^- = -V 2 , length . (46) 

(47) 



Only this normalization gives you Hi = R for a sphere; check it. The descrip- 
tive names you see were given by the authors. There is a fourth shapefinder - 
the genus has been given the honour to stand for it, directly. As genus counts 
"minus" things (minus the number of isolated objects, minus the number of 
holes), the fourth Minkowski functional V3 should be a better candidate. 

The first set of shapefinders is accompanied by the 'second order' shapefind- 
ers 



K\ = — — , planarity , (48) 



H 2 + H 1 

^-2 = 77 -rr , filamentarity . (49) 

H 2 + H\ 

(50) 



These five (six, if you count the genus) numbers describe pretty well the shape 
of smooth (ellipsoidal) surfaces. In this case, the ratios K\ and K 2 vary nicely 
from to 1, and another frequently used ratio, Ki/K 2l has definite trends 
with respect to the shape of the object. But things get much more interesting 
when you start calculating the shapefinders of real superclusters; as shapes 
get complex, simple meanings disappear. Also, as shapefinders are defined as 
ratios of observationally estimated numbers (or ratios of quadratic combina- 
tions, as Hi work out in terms of Vi), they are extremely noisy. So, in case 
of serious use, a procedure should be developed to estimate the shapefinders 
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directly, not by the ratio rules (44, 48); such a procedure does not exist yet. 
But, for the moment, the shapefinders are probably the best shape description 
tool (for cosmology) we have. 

3.5 Morphology of Wavelet-Denoised Density 

So much for the preliminaries. Now we have all our tools (wavelets, densities, 
Minkowski functionals). Let us use them and see what is the morphology of 
the real galaxy density distribution. 

This question has been asked and answered about a hundred times, start- 
ing from the first topology paper by Gott et al [4]. The first data set was a 
cube from the CfA I sample and contained 123 galaxies, if I remember right. 
(Imagine estimating a spatial density on the basis of a hundred points; this 
is the moment that Landau's definition of a cosmologist is appropriate: "Cos- 
mologists are often wrong, but never in doubt" .) The answer was - Gaussian ! ; 
ten points for courage. The same optimistic answer has been heard about a 
hundred times since (in all the papers published), with slight corrections in 
later times, as data is getting better. These corrections have been explained 
by different observational and dynamical effects, and peace reigns. 

But - all these studies have carefully smoothed the galaxy catalogues by 
nice wide Gaussian kernels to get a proper density. Doubts have been expressed 
that one Gaussianity could lead to another (by Peter Coles, for example), and 
the nice Gaussian behaviour we get is exactly that we have built in in the 
density field. So, let us wavelet-denoise the density field (a complex recipe, 
but without any Gaussians), and estimate the Minkowski functionals. The 
next two figures are from our recent work ([7]). 

The data are the 2dFGRS Northern galaxies; we chose a maximum- volume 
one-magnitude interval volume-limited (constant density) sample and cut a 
maximum- volume brick from it (interesting, every time I say 'maximum', the 
sample gets smaller). Constant density is necessary, otherwise typical density 
scales will change with distance, and a brick was cut in order to avoid border 
corrections (these can bring additional difficulties for wavelet denoising). We 
carefully wavelet-denoised the density; Fig. 18 shows its fourth Minkowski 
functional. 

First, we see that our wavelet. denoised density is never close to Gaussian. 
Secondly, the specially built N-body catalogues (mocks) do not have Gaussian 
morphology, too; although they deviate from the real data, they are closer to 
it than to the example Gaussian. We see also that the same galaxies, smoothed 
by a yet very narrow Gaussian kernel (a = 2 Mpc/h), show an almost Gaussian 
morphology. Thus, the clear message of this figure is that the morphology of 
a good (we hope the best) adaptive galaxy density is far from Gaussian; and 
the Gaussianity-confirming results obtained so far are all the consequence of 
the Gaussianity input by hand - Gaussian smoothing. This figure exhibits a 
little non-Gaussianity yet; the next one (Fig. 19) almost does not. The filter 
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Fig. 18. The Minkowski functional Vz for the 2dF brick, for the wavelet denoised 
data set (thick solid line). The variability range of the wavelet denoised mocks is 
shown with bars. We show also the 95% confidence limits for 1300 realizations of 
theoretical Gaussian density fields (dashed lines), and the Vz data curve (thin solid 
line), all obtained for the Gaussian a = 2 Mpc/ft smoothing 

used there is still narrow (er = 4Mpc//i); the filter widths used in most papers 
are around twice that value. 

I am pretty sure that Gaussian smoothing is the culprit here. We built 
completely non-Gaussian density distributions (Voronoi walls and networks), 
Gaussian smoothed them using popular recipes about the kernel size, and 
obtained perfectly Gaussian Minkowski functionals. 

One reason for that, as Vicent Martinez has proposed, is that the severe 
smoothing used changes a density field into Poissonian, practically. Try it - 
smooth a density field with a Gaussian of a = ro, where ro is the correlation 
length, and you get a density field with a very flat low-amplitude correlation 
function, almost Poissonian. And the Minkowski functionals of Poissonian 
density fields 7 are Gaussian; we tested that. 

Another reason that turns Minkowski functionals Gaussian even for small 
<t, must be the extended wings of Gaussian kernels. Although they drop pretty 
fast, they are big enough to add to a small extra ripple on the main density 
field. As Minkowski functionals are extremely sensitive to small density vari- 
ations, then all they see is that ripple. This is especially well seen in initially 
empty regions. Usually, one uses a FFT-based procedure for Gaussian smooth- 

7 To be more exact, here is the recipe - take a Poissonian point process of N points 
in a volume V and smooth it with a Gaussian kernel of a = d, where d = (V/N) 1 * 13 
is the mean distance between the points. 
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Fig. 19. The Minkowski functional V3 for the 2dF brick, Gaussian smoothed with 
a — 4Mpc//i (thick solid line). The variability range of the mocks is shown with 
bars. We show also the 95% confidence limits for 1300 realizations of theoretical 
Gaussian density fields (dashed lines). 

ing, as that is at least a hundred times faster (for present catalogue volumes) 
that direct convolution in real space. This procedure generates a wildly fluc- 
tuating small-amplitude density field in empty regions, and we did not realize 
for a long time at first where those giant-amplitude ghost MF-s came from. 
Gaussian smoothing and FFT, that was their address. 

3.6 Multiscale Morphology 

So, the galaxy density similar to that we have accustomed to find in our 
everyday experience, is decidedly non-Gaussian. But is that a problem? Cos- 
mological dynamics tells us that structure evolves at different rate at different 
scales; a true density mixes these scales all together and is not the best object 
to search for elusive traces of initial conditions. Good. Multiscale densities to 
the rescue. 

The results that will end this chapter did not exist at the time of the 
summer school. But we live in the present, and time is short, so I will include 
them. A detailed account is already accepted for publication ([10]), I shall 
show only a collection of Minkowski functionals here. 

As our basic data set, we took the 2dFGRS volume-limited samples for 
the [—20,-19] magnitude interval; they have the highest mean density among 
similar one-magnitude interval samples. We did not cut bricks this time, but 
corrected for sample boundaries; we have learnt that by now. We wavelet- 
decomposed the galaxy density fields and found the Minkowski functionals; 




36 Enn Saar 



as simple as that. Although wavelet decompositions are linear and should not 
add anything to the morphology of the fields, we checked that on simulated 
Gaussian density fields. Right, they do not add any extra morphological signal. 

The results (for the 2dFGRS North) arc shown in Figs. 20-22. As the 
amplitudes of the (densities of) Minkowski functionals vary in a large interval, 
we use a sign-aware logarithmic mapping: 

logn(x; a) = sgn(x) log(l + |x|/a) . 

This mapping accepts arguments of both signs (log(x) does not), is linear for 
|x| << a and logarithmic for |x| >> a. As the figures show, for the scales 
possible to study, the morphology of the galaxy density distribution is decid- 
edly not Gaussian. The deviations are not too large for the second Minkowski 
functional u 2 (watch how the maxima shift around), but are clearly seen for 
the two others. The maximum wavelet order here is 3 for a v^Mpc/Zi grid, 
that corresponds to a characteristic scale of 2 3 ^/2 w 11.3Mpc//i. As the mean 
thickness of the 2dFGR North slice is about 40-50 Mpc//i, we cannot go much 
further - the higher order wavelet slices would be practically two-dimensional. 
The 2dFGRS Southern dataset has similar size limitations. So, as 10Mpc//i is 
a scale where cosmological dynamics might have slight morphological effects, 
the question if the original morphology of the cosmological density field was 
Gaussian, is not answered yet. But we shall find it out soon, when the SDSS 
will finally fill its full planned volume. 




-2-1012 

v 

Fig. 20. Summary of the densities of the second MF vi for the data and all wavelet 
orders for the 2dFN19 sample, in the logn mapping. Thick lines show reference 
Gaussian predictions. Full lines stand for the lMpc/fe grid, dotted lines - for the 
V^Mpc/Zi grid 
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Fig. 21. Summary of the densities of the third MF V2 for the data and all wavelet 
orders for the 2dFN19 sample, in the logn mapping. Thick lines show reference 
Gaussian predictions. Full lines stand for the 1 Mpc/h grid, dotted lines - for the 
V^Mpc/Zi grid 
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Fig. 22. Summary of the densities of the fourth MF vs for the data and all wavelet 
orders for the 2dFN19 sample, in the logn mapping. Thick lines show reference 
Gaussian predictions. Full lines stand for the 1 Mpc/h grid, dotted lines - for the 
^2Mpc//i grid 
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These are the results, for the moment. The morphology of the galaxy 
density field is far from Gaussian, in contrary to practically every earlier result. 
Have all these results only confirmed that if we take a serious smoothing effort, 
we are able to smooth any density field to Poissonian? An indecent thought. 

But do not be worried, results come and go, this is the nature of research. 
Methods, on the contrary, stay a little longer, and this school was all about 
methods. 



Recommended Reading 

I was surprised that Bernard Jones recommended a wavelet bookshelf that 
is almost completely different from mine (the only common book is that of 
I. Daubechies'). So, read those of Bernard's choice, and add mine: 

• five books: 

1. Stcphane Mallat, A Wavelet Tour of Signal Processing, 2nd ed., Aca- 
demic Press, London, 1999, 

2. C. Sidney Burrus, Ramesh A. Gopinath, Haitao Gao, Introduction to 
Wavelets and Wavelet Transforms, Prentice Hall, NJ, 1997, 

3. Ingrid Daubechies, Ten lectures on wavelets, SIAM, Philadelphia, 2002, 

4. Jean-Luc Starck, Fionn Murtagh, "Astronomical Image and Data Anal- 
ysis", 2nd ed., Springer, 2006 (application of wavelets and many other 
wonderful image processing methods in astronomy), 

5. Bernard W. Silverman, Density Estimation for Statistics and Data 
Analysis, Chapman & Hall / CRC Press, Boca Raton, 1986 (the clas- 
sical text on density estimation), 

• three articles: 

1. Mark J. Shcnsa, The Discrete wavelet Transform: Wedding the A Trous 
and Mallat Algorithms, IEEE Transactions on Signal Processing, 40, 
2464-2482, 1992 (the title explains it all), 

2. K.R. Mecke, T. Buchert, H. Wagner, Robust morphological measures 
for large-scale structure in the Universe, Astron. Astrophys. 288, 697- 
704, 1994 (introducing Minkowski functionals in cosmology), 

3. Jens Schmalzing, Thomas Buchert, Beyond Genus Statistics: A Unified 
Approach to the Morphology of Cosmic Structure, Ap. J. Letts 482, 
LI, 1997, (presentation of two grid algorithms). 

• two web pages: 

1. a wavelet tutorial by Jean-Luc Starck at (http : // j starck . free . f r) , 

2. the wavelet pages by David Donoho at http: //www-stat . Stanford . 
edu/~donoho/ (look at lectures and reports). 
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